FRASH: A framework to test algorithms of similarity hashing
Authors
Abstract
Similar resources
FRASH: A framework to test algorithms of similarity hashing
Automated input identification is a very challenging, but also important task. Within computer forensics this reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is necessary to cope with similar inputs (e.g., different versions of a file), embedded objects (e.g., a JPG within a Wo...
Sparse similarity-preserving hashing
In recent years, a lot of attention has been devoted to efficient nearest neighbor search by means of similarity-preserving hashing. One of the plights of existing hashing techniques is the intrinsic trade-off between performance and computational complexity: while longer hash codes allow for lower false positive rates, it is very difficult to increase the embedding dimensionality without incur...
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database. Various methods have been developed to address this problem, and recently a lot of efforts have been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since...
Selection of Hashing Algorithms
INTRODUCTION The National Software Reference Library (NSRL) Reference Data Set (RDS) is built on file signature generation technology that is used primarily in cryptography. The selection of the specific file signature generation routines is based on customer requirements and the necessity to provide a level of confidence in the reference data that will allow it to be used in the U.S. Courts. T...
WaldHash: sequential similarity-preserving hashing
Similarity-sensitive hashing seeks compact representation of vector data as binary codes, so that the Hamming distance between code words approximates the original similarity. In this paper, we show that using codes of fixed length is inherently inefficient as the similarity can often be approximated well using just a few bits. We formulate a sequential embedding problem and approach similarity...
Journal
Journal title: Digital Investigation
Year: 2013
ISSN: 1742-2876
DOI: 10.1016/j.diin.2013.06.006